
How degenerate is the parametrization of neural networks with the ReLU activation function?

Neural Information Processing Systems

Neural network training is usually accomplished by solving a non-convex optimization problem using stochastic gradient descent. Although one optimizes over the network's parameters, the main loss function generally depends only on the realization of the neural network, i.e., the function it computes. Studying the optimization problem over the space of realizations opens up new ways to understand neural network training. In particular, common loss functions like mean squared error and categorical cross entropy are convex on spaces of neural network realizations, which are themselves non-convex. The approximation capabilities of neural networks can be used to deal with the latter non-convexity, which allows us to establish that, for sufficiently large networks, local minima of a regularized optimization problem on the realization space are almost optimal.
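The degeneracy in the title refers to the fact that many distinct parameter choices realize the same function. A standard illustration (not taken from the paper itself, but a well-known property of ReLU networks) is the positive homogeneity of the ReLU: relu(a*z) = a*relu(z) for a > 0, so rescaling the hidden-layer weights by a and the output weights by 1/a leaves the realization unchanged. A minimal sketch for a one-hidden-layer network:

```python
import numpy as np

# One-hidden-layer ReLU network: f(x) = v . relu(W x)
def realization(W, v, x):
    return v @ np.maximum(W @ x, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))   # hidden-layer weights
v = rng.normal(size=3)        # output-layer weights
x = rng.normal(size=2)        # an arbitrary input point

# Positive homogeneity of ReLU: relu(a*z) = a*relu(z) for a > 0, hence
# the rescaled parameters (a*W, v/a) realize exactly the same function.
a = 5.0
out_original = realization(W, v, x)
out_rescaled = realization(a * W, v / a, x)
print(np.allclose(out_original, out_rescaled))  # True: distinct parameters, identical realization
```

This is why the loss, viewed as a function of the parameters, has large flat degenerate regions, whereas viewed as a function of the realization it can be convex.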





Reviews: How degenerate is the parametrization of neural networks with the ReLU activation function?

Neural Information Processing Systems

I read the author response and the other reviews. The author response provides a nice additional demonstration of the implications of connecting the two problems via inverse stability. This is an interesting and potentially important paper for future research on this topic. The paper explains the definition of inverse stability, proves its implications for neural network optimization, exhibits failure modes in which inverse stability does not hold, and proves inverse stability for a simple one-hidden-layer network with a single output. Originality: the paper definitely opens a very interesting and unique research direction.



Authors: Elbrächter, Dennis Maximilian; Berner, Julius; Grohs, Philipp